Intrinsic Plagiarism Analysis with Meta Learning

Authors

  • Benno Stein
  • Sven Meyer zu Eissen
Abstract

In intrinsic plagiarism analysis we are given a document, allegedly written by a single author, and the task is to find sufficient evidence either to accept or to reject this hypothesis. Existing research on intrinsic plagiarism analysis tries to quantify changes in writing style by analyzing the distributions of particular style markers. In this way, acceptable detection rates can be achieved if the portion of plagiarized sections is known a priori and if the document is of a single genre. However, neither assumption may hold in practice. In [6] Koppel and Schler propose a new approach to the authorship verification problem, where the task is to determine whether two texts are written by the same author. Their approach is ingenious in that it provides a means to detect relatively shallow differences in writing style while being independent of language, period, and genre. Since the approach requires two (relatively large) samples of text to be compared to each other, it cannot be applied directly to the intrinsic plagiarism analysis problem. The main contribution of our paper is the idea of addressing the shortcomings of existing approaches to intrinsic plagiarism analysis with the technology presented in [6]. We propose a hybrid approach that employs style marker analysis to generate hypotheses, which are then accepted or rejected by an authorship verification analysis. A second contribution of our paper is the evaluation of style markers for German text and their application to a real-world plagiarism case.
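
The hypothesis-generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single style marker (average word length), the z-score threshold, and the function names `avg_word_length` and `flag_suspicious_sections` are all assumptions chosen for illustration. The sketch only flags candidate sections; the subsequent authorship verification step described in [6] is not shown.

```python
import re
from statistics import mean, pstdev


def avg_word_length(text: str) -> float:
    """Hypothetical style marker: mean length of alphabetic words."""
    words = re.findall(r"[^\W\d_]+", text)
    return mean(len(w) for w in words) if words else 0.0


def flag_suspicious_sections(sections, threshold=1.5):
    """Return indices of sections whose marker value deviates from the
    document-wide mean by more than `threshold` standard deviations.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    values = [avg_word_length(s) for s in sections]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All sections look identical under this marker: nothing to flag.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    doc = ["the cat sat on the mat and ran"] * 5 + [
        "extraordinarily sophisticated terminological constructions predominate"
    ]
    print(flag_suspicious_sections(doc))
```

Each flagged index is only a hypothesis that the section stems from a different author. In the hybrid scheme the abstract describes, such hypotheses would then be accepted or rejected by an authorship verification analysis.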


Similar resources

Intrinsic plagiarism analysis

Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world where a reference collection is given. This article investigates the question whether plagiarism can be detected by a computer program...


Intrinsic Plagiarism Detection using Complexity Analysis

We introduce Kolmogorov Complexity measures as a way of extracting structural information from texts for Intrinsic Plagiarism Detection. Kolmogorov complexity measures have been used as features in a variety of machine learning tasks including image recognition, radar signal classification, EEG classification, DNA analysis, speech recognition and some text classification tasks (Chi and Kong, 19...


Scientists Admitting to Plagiarism: A Meta-analysis of Surveys

We conducted a systematic review and meta-analysis of anonymous surveys asking scientists whether they ever committed various forms of plagiarism. From May to December 2011 we searched 35 bibliographic databases, five grey literature databases and hand searched nine journals for potentially relevant studies. We included surveys that asked scientists if, in a given recall period, they had commit...


Linguistic and Statistical Traits Characterising Plagiarism

This paper investigates the problem of distinguishing between original and rewritten text materials, with focus on the application of plagiarism detection. The hypothesis is that original texts and rewritten texts exhibit significant and measurable differences, and that these can be captured through statistical and linguistic indicators. We propose and analyse a number of these indicators (incl...


Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011

Plagiarism detection has been considered as a classification problem which can be approximated with intrinsic strategies, considering self-based information from a given document, and external strategies, considering comparison techniques between a suspicious document and different sources. In this work, both intrinsic and external approaches for plagiarism detection are presented. First, the m...




Publication date: 2007